In this paper, we study the \underline{R}obust \underline{o}ptimization for \underline{se}quence \underline{Net}worked \underline{s}ubmodular maximization (RoseNets) problem. We interweave robust optimization with sequence networked submodular maximization. The elements are connected by a directed acyclic graph, and the objective function is submodular not on the elements but on the edges of the graph. Under such a networked submodular scenario, the impact of removing an element from a sequence depends both on its position in the sequence and on its position in the network, which makes existing robust algorithms inapplicable. In this paper, we take the first step toward studying the RoseNets problem. We design a robust greedy algorithm that is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends both on the number of removed elements and on the network topology. We further conduct experiments on real applications of recommendation and link prediction. The experimental results demonstrate the effectiveness of the proposed algorithm.
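To make the edge-based objective concrete, here is a minimal sketch of a plain sequence-greedy selection under the setting described above; the toy DAG, the edge utilities, and the budget are illustrative assumptions, and the robustness machinery of the actual RoseNets algorithm is omitted.

```python
# Toy networked-submodular objective: utility is attached to edges of a DAG,
# and an edge (u, v) contributes only if u appears before v in the sequence.
edges = {("a", "b"): 0.8, ("a", "c"): 0.5, ("b", "d"): 0.7, ("c", "d"): 0.4}
elements = {"a", "b", "c", "d"}

def sequence_value(seq):
    """Value of an ordered sequence under the edge-based objective."""
    value = 0.0
    for i, head in enumerate(seq):
        for tail in seq[:i]:
            value += edges.get((tail, head), 0.0)
    return value

def greedy_sequence(budget):
    """Append, at each step, the element with the largest marginal gain.
    The gain of an element depends on which of its in-neighbors are already
    selected, i.e., on its position in both the sequence and the network."""
    seq = []
    for _ in range(budget):
        remaining = elements - set(seq)
        if not remaining:
            break

        def gain(v):
            return sequence_value(seq + [v]) - sequence_value(seq)

        seq.append(max(remaining, key=gain))
    return seq

seq = greedy_sequence(budget=3)
print(seq, sequence_value(seq))
```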
Disentangled representation learning remains challenging because ground-truth factors of variation do not naturally exist. To address this, we present Vocabulary Disentanglement Retrieval (VDR), a simple yet effective retrieval-based disentanglement framework that leverages natural language as distant supervision. Our approach is built upon the widely used bi-encoder architecture with disentanglement heads and is trained on data-text pairs that are readily available on the web or in existing datasets. This makes our approach task- and modality-agnostic, with potential for a wide range of downstream applications. We conduct experiments on 16 datasets in both text-to-text and cross-modal scenarios and evaluate VDR in a zero-shot setting. With the incorporation of disentanglement heads and a minor increase in parameters, VDR achieves significant improvements over the base retriever it is built upon, with 9% higher NDCG@10 scores in zero-shot text-to-text retrieval and an average of 13% higher recall in cross-modal retrieval. In comparison to other baselines, VDR outperforms them on most tasks, while also improving explainability and efficiency.
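A minimal PyTorch sketch of the idea of a bi-encoder with a vocabulary-sized disentanglement head follows; the backbone outputs, vocabulary size, and top-k sparsification are assumptions for illustration rather than VDR's exact design.

```python
import torch
import torch.nn as nn

class DisentanglementHead(nn.Module):
    """Maps a dense encoder output to a sparse, vocabulary-aligned vector,
    so each dimension can be read as the weight of one vocabulary token."""
    def __init__(self, hidden_dim=768, vocab_size=30522, top_k=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vocab_size)
        self.top_k = top_k

    def forward(self, hidden):                       # hidden: (batch, hidden_dim)
        logits = torch.relu(self.proj(hidden))       # non-negative token weights
        topk = torch.topk(logits, self.top_k, dim=-1)
        sparse = torch.zeros_like(logits).scatter_(-1, topk.indices, topk.values)
        return sparse                                # sparse, interpretable vector

# Retrieval scores are dot products between query and candidate representations.
head = DisentanglementHead()
query_hidden = torch.randn(2, 768)    # stand-ins for backbone outputs (e.g. [CLS])
doc_hidden = torch.randn(5, 768)
scores = head(query_hidden) @ head(doc_hidden).T    # (2, 5) similarity matrix
print(scores.shape)
```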
Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) on several vision tasks thanks to their global modeling capability. However, ViT lacks the inductive biases inherent to convolution, so it requires a large amount of training data. As a result, ViT does not perform as well as CNNs on small datasets, such as those in medicine and science. We experimentally found that masked autoencoders (MAE) can make the transformer focus more on the image itself, thus alleviating the data-hungry issue of ViT to some extent. However, the current MAE model is too complex, resulting in over-fitting on small datasets, which still leaves a gap between MAEs trained on small datasets and advanced CNN models. We therefore investigated how to reduce the decoder complexity of MAE and found an architectural configuration better suited to small datasets. In addition, we designed a location prediction task and a contrastive learning task to introduce localization and invariance characteristics into MAE. Our contrastive learning task not only enables the model to learn high-level visual information but also allows training of MAE's class token, something that most MAE improvement efforts do not consider. Extensive experiments show that our method achieves state-of-the-art performance on standard small datasets as well as medical datasets with few samples, compared to the current popular masked image modeling (MIM) methods and vision transformers for small datasets. The code and models are available at https://github.com/Talented-Q/SDMAE.
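The following sketch illustrates the general recipe of a masked autoencoder with a deliberately small decoder of the kind discussed above; the patch size, depths, and the omission of positional embeddings, the reconstruction loss, and the location-prediction and contrastive heads are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """ViT-style patch embedding, a shallow encoder over visible patches,
    and a single-block decoder that reconstructs pixels for every position."""
    def __init__(self, img=32, patch=4, dim=192, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
        self.decoder = nn.TransformerEncoder(            # reduced decoder complexity
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pred = nn.Linear(dim, patch * patch * 3)    # per-patch pixel prediction

    def forward(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        idx = torch.rand(B, N, device=x.device).argsort(dim=1)  # random masking
        visible = torch.gather(tokens, 1, idx[:, :keep, None].expand(-1, -1, D))
        encoded = self.encoder(visible)
        # Append mask tokens for the hidden positions and decode all of them.
        full = torch.cat([encoded, self.mask_token.expand(B, N - keep, D)], dim=1)
        return self.pred(self.decoder(full)), idx               # predictions + permutation

model = TinyMAE()
pred, idx = model(torch.randn(2, 3, 32, 32))
print(pred.shape)   # (2, 64, 48): pixel predictions for each of the 64 patches
```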
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
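As a rough illustration of the kind of pipeline the challenge targets, the sketch below builds a small 3X super-resolution network and converts it to a full-integer INT8 TFLite model; the architecture, crop size, and random representative data are placeholders rather than any participant's solution.

```python
import tensorflow as tf

def build_sr_model(scale=3):
    """A few convolutions followed by a depth-to-space (pixel shuffle) upsampler,
    a layout that tends to map well onto quantized mobile NPUs."""
    inp = tf.keras.Input(shape=(64, 64, 3))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

model = build_sr_model()

def representative_data():
    for _ in range(8):                    # stand-in for low-resolution DIV2K crops
        yield [tf.random.uniform((1, 64, 64, 3))]

# Post-training full-integer quantization to obtain an INT8 model for the NPU.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
with open("sr_int8.tflite", "wb") as f:
    f.write(converter.convert())
```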
Graph pre-training strategies have been attracting attention in the graph mining community, thanks to their flexibility in parameterizing graph neural networks (GNNs) without any label information. The key idea lies in encoding valuable information by predicting masked graph signals extracted from the input graph. To balance the importance of different graph signals (e.g., nodes, edges, subgraphs), existing approaches are mostly hand-engineered, reweighting the graph signals by introducing hyperparameters. However, human intervention with sub-optimal hyperparameters often injects additional bias and deteriorates the generalization performance in downstream applications. This paper addresses these limitations from a new perspective, namely providing a curriculum for pre-training GNNs. We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs with diverse structures and disparate feature spaces. To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically reweights graph signals in order to ensure good generalization to the target domain. Moreover, we shed new light on the problem of domain adaptation for relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of pre-trained GNNs. Extensive experiments on a large number of real graphs validate and verify the performance of MentorGNN.
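A toy sketch of the reweighting idea is given below: pre-training losses at different granularities are combined with weights produced by a simple schedule instead of hand-tuned hyperparameters. The signal names, the schedule, and the placeholder losses are assumptions for illustration and do not reproduce MentorGNN's actual curriculum.

```python
import torch

def curriculum_weights(losses, step, total_steps):
    """Favor easier (lower-loss) signals early on and flatten the weighting
    as training proceeds -- a minimal automatic curriculum schedule."""
    temperature = 0.5 + 2.0 * step / total_steps
    return torch.softmax(-losses / temperature, dim=0)

def combined_pretrain_loss(node_loss, edge_loss, subgraph_loss, step, total_steps):
    losses = torch.stack([node_loss, edge_loss, subgraph_loss])
    weights = curriculum_weights(losses.detach(), step, total_steps)
    return (weights * losses).sum()

# Placeholder values standing in for masked node / edge / subgraph objectives.
loss = combined_pretrain_loss(torch.tensor(0.9), torch.tensor(0.4),
                              torch.tensor(1.2), step=10, total_steps=100)
print(loss.item())
```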
We consider a lost-sales inventory control system with a fixed lead time $L$ over a planning horizon of $T$ periods. Supply is uncertain and is a function of the order quantity (due to random yield/capacity, etc.). Our goal is to minimize the $T$-period cost, a problem that is known to be computationally intractable even under known demand and supply distributions. In this paper, we assume that both the demand and supply distributions are unknown and develop a computationally efficient online learning algorithm. We show that our algorithm achieves a regret (i.e., the performance gap between the cost of our algorithm and that of an optimal policy) of $O(L+\sqrt{T})$ when $L \geq \log(T)$. We do so by 1) showing that the cost of our algorithm is at most $O(L+\sqrt{T})$ higher than that of the best constant-order policy under full information (a widely used algorithm), for any $L \geq 0$, and 2) leveraging the latter's known performance guarantees from the existing literature. To the best of our knowledge, a finite-sample regret bound of $O(\sqrt{T})$ (and polynomial in $L$) against the optimal-policy benchmark was not previously known in the online inventory control literature. A key challenge of this learning problem is that both demand and supply data can be censored, so only truncated values are observable. We circumvent this challenge by showing that the data generated under an order quantity $q^2$ allow us to simulate the performance of not only $q^2$ but also all $q^1 < q^2$, a key observation for obtaining sufficient information even under data censoring. By establishing a high-probability coupling argument, we are able to evaluate and compare the performance of different order policies at their steady states within a finite time horizon. Since the problem lacks convexity, we develop an active elimination method that adaptively rules out suboptimal solutions.
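The sketch below illustrates two of the ingredients described above in a heavily simplified single-period setting: a single supply and demand draw is reused to evaluate every remaining candidate order quantity (mirroring the observation that data generated under a larger order informs all smaller orders), and an active-elimination rule discards quantities whose estimated cost is clearly suboptimal. The cost parameters, distributions, and confidence radius are arbitrary placeholders, and lead times and multi-period inventory dynamics are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

def period_cost(q, yield_draw, demand_draw, holding=1.0, penalty=4.0):
    """One-period cost when ordering q under random yield and random demand."""
    delivered = min(yield_draw, q)               # supply is a function of the order
    leftover = max(delivered - demand_draw, 0)   # holding cost on unsold stock
    shortfall = max(demand_draw - delivered, 0)  # penalty on unmet demand
    return holding * leftover + penalty * shortfall

candidates = list(range(21))                     # candidate constant order quantities
active = set(candidates)
cost_sums = np.zeros(len(candidates))

for t in range(1, 2001):
    yield_draw = rng.uniform(0, 20)              # stand-in for the unknown supply law
    demand_draw = rng.poisson(8)                 # stand-in for the unknown demand law
    for q in active:                             # one draw updates every active q
        cost_sums[q] += period_cost(q, yield_draw, demand_draw)
    # Active elimination: drop quantities whose mean cost is clearly suboptimal.
    active_sorted = sorted(active)
    means = cost_sums[active_sorted] / t
    radius = 20 * np.sqrt(np.log(2000) / t)      # crude confidence radius
    best = means.min()
    active = {q for q, m in zip(active_sorted, means) if m <= best + 2 * radius}

print("surviving order quantities:", sorted(active))
```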
Evaluations in machine learning are typically informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics benchmark provides modular infrastructure for dataset, model, and metric developers so that they can benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
With the success of vision-language pre-training, we have witnessed the state of the art being pushed forward in multi-modal understanding and generation. However, current pre-training paradigms either cannot target all modalities at once (e.g., text generation and image generation) or require multiple well-designed tasks, which significantly limits scalability. We demonstrate that a unified modal model can be learned with a prefix language modeling objective over text and image sequences. Thanks to this simple yet powerful pre-training paradigm, our proposed model, DaVinci, is very easy to train, scalable to huge data, and adaptable to a variety of downstream tasks across modalities (language / vision / vision+language), types (understanding / generation), and settings (e.g., zero-shot, fine-tuning, linear evaluation) with a single unified architecture. DaVinci achieves competitive performance on a wide range of 26 understanding / generation tasks and outperforms previous unified vision-language models on most of them, including ImageNet classification (+1.6%), VQAv2 (+1.4%), COCO caption generation (BLEU@4 +1.1%, CIDEr +1.5%), and COCO image generation (+0.9%, FID -1.0%), at comparable model and data scales. Furthermore, we provide a clear benchmark for future research by reporting performance at different scales with heterogeneous and broad coverage of data distributions. Our results establish new, stronger baselines for future comparisons at different data scales and shed light on the difficulty of comparing VLP models more generally.
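A small sketch of the prefix language modeling objective mentioned above follows: a decoder predicts the suffix tokens of a concatenated (image-token, text-token) sequence while the prefix only serves as conditioning and contributes no loss. The vocabulary size, model size, purely causal attention over the prefix, and the use of discrete image tokens are simplifying assumptions, not DaVinci's exact configuration.

```python
import torch
import torch.nn as nn

vocab_size, dim, prefix_len = 1000, 256, 16

embed = nn.Embedding(vocab_size, dim)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
lm_head = nn.Linear(dim, vocab_size)

tokens = torch.randint(0, vocab_size, (2, 32))          # [image tokens | text tokens]
seq = tokens.size(1) - 1
causal = torch.full((seq, seq), float("-inf")).triu(1)  # causal attention mask

hidden = decoder(embed(tokens[:, :-1]), mask=causal)    # inputs shifted by one step
logits = lm_head(hidden)

targets = tokens[:, 1:].clone()
targets[:, : prefix_len - 1] = -100                     # no loss on prefix positions
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1), ignore_index=-100)
print(loss.item())
```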
Clustering is a fundamental machine learning task that has been studied extensively in the literature. Classical clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As data become increasingly complicated and complex, shallow (traditional) clustering methods can no longer handle high-dimensional data types. With the tremendous success of deep learning, especially deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of deep clustering, i.e., jointly optimizing representation learning and clustering, has been proposed and has attracted growing attention in the community. Motivated by the great success of deep learning in clustering, one of the most fundamental machine learning tasks, and the recent advances in this direction, this survey provides a comprehensive overview of state-of-the-art methods. We summarize the essential components of deep clustering and categorize existing methods by how they design the interaction between deep representation learning and clustering. Moreover, this survey also provides popular benchmark datasets, evaluation metrics, and open-source implementations to clearly illustrate the various experimental settings. Last but not least, we discuss practical applications of deep clustering and suggest challenging topics deserving further investigation as future directions.
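As a concrete instance of jointly optimizing representation learning and clustering, the sketch below follows the well-known DEC-style recipe of soft assignments sharpened into a self-training target; the encoder, the data, and the number of clusters are placeholders, and this is only one of the interaction designs such a survey covers.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
centers = nn.Parameter(torch.randn(5, 10))             # 5 learnable cluster centers

def soft_assign(z, centers, alpha=1.0):
    """Student's t similarity between embeddings and cluster centers."""
    dist = torch.cdist(z, centers) ** 2
    q = (1.0 + dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpen assignments to emphasize confident points (self-training target)."""
    p = (q ** 2) / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

x = torch.randn(64, 784)                                # stand-in for input features
optimizer = torch.optim.Adam(list(encoder.parameters()) + [centers], lr=1e-3)

for step in range(3):                                   # a few illustrative updates
    q = soft_assign(encoder(x), centers)
    loss = nn.functional.kl_div(q.log(), target_distribution(q).detach(),
                                reduction="batchmean")  # KL(P || Q), as in DEC
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```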
We consider the problem of policy transfer between two Markov decision processes (MDPs). We introduce a lemma, based on existing theoretical results in reinforcement learning (RL), to measure the relativity between two arbitrary MDPs, namely the difference between any two cumulative expected returns defined under different policies and environment dynamics. Based on this lemma, we propose two new algorithms, called Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which offer fast policy transfer and dynamics modeling, respectively. RPO updates the policy using the relative policy gradient so as to transfer a policy evaluated in one environment to maximize returns in another, while RTO updates the parameterized dynamics model (if one exists) using the relative transition gradient to reduce the gap between the dynamics of the two environments. Integrating the two algorithms then yields the complete algorithm, Relative Policy-Transition Optimization (RPTO), in which the policy interacts with the two environments simultaneously, so that data collection from both environments, policy updates, and transition updates are completed in one closed loop, forming a principled learning framework for policy transfer. We demonstrate the effectiveness of RPTO on classic control tasks in OpenAI Gym by creating policy transfer problems via variant dynamics.
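The following sketch only mirrors the structure of the closed loop described above: the policy interacts with a source and a target environment in parallel, a dynamics model is pulled toward closing the gap between the two environments' transitions (standing in for the transition update), and the policy is improved with a plain REINFORCE gradient on target-environment returns (standing in for the policy update). The toy environments, network sizes, and these simple gradient estimators are assumptions and do not reproduce the RPO/RTO updates themselves.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))
dynamics = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                         nn.Linear(32, obs_dim))
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
dyn_opt = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

def toy_env_step(state, action, drift):
    """Stand-in dynamics; `drift` plays the role of the variant-dynamics parameter."""
    next_state = state + 0.1 * torch.randn_like(state) + drift
    reward = -next_state.pow(2).sum(dim=-1)
    return next_state, reward

for iteration in range(3):
    state = torch.zeros(8, obs_dim)                     # a batch of short rollouts
    log_probs, rewards, dyn_losses = [], [], []
    for t in range(10):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        onehot = nn.functional.one_hot(action, act_dim).float()
        # Source and target environments see the same states but differ in dynamics.
        src_next, _ = toy_env_step(state, onehot, drift=0.00)
        tgt_next, reward = toy_env_step(state, onehot, drift=0.05)
        # The model learns the residual between source and target transitions,
        # i.e., it is trained to close the gap between the two environments.
        pred_gap = dynamics(torch.cat([state, onehot], dim=-1))
        dyn_losses.append((src_next + pred_gap - tgt_next).pow(2).mean())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        state = tgt_next.detach()
    # Transition-model step: reduce the discrepancy toward the target environment.
    dyn_opt.zero_grad()
    torch.stack(dyn_losses).mean().backward()
    dyn_opt.step()
    # Policy step: maximize return collected in the target environment.
    returns = torch.stack(rewards).sum(dim=0).detach()
    pi_loss = -(torch.stack(log_probs).sum(dim=0) * returns).mean()
    pi_opt.zero_grad()
    pi_loss.backward()
    pi_opt.step()
    print(iteration, returns.mean().item())
```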